Non-negative matrix factorization using TensorFlow

contributed by Nipun Batra

Perform NNMF in TensorFlow on a matrix with missing entries, mimicking the movie-recommendation problem. We use projected gradient descent: at each iteration we take a gradient descent step on the reconstruction cost, then project the factors back onto the feasible set by clipping any negative entries to zero.
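As a mental model before the TensorFlow version, here is a minimal NumPy sketch of a single projected-gradient update (the function name and signature are illustrative, not part of the original notebook; A may contain NaN at the missing entries and mask is True where A is observed):

import numpy as np

def projected_gd_step(A, W, H, mask, lr=0.001):
    # Residual on the observed entries only; missing entries contribute 0
    R = np.where(mask, A - W @ H, 0.0)
    # Gradients of the masked squared error w.r.t. W and H
    grad_W = -2 * R @ H.T
    grad_H = -2 * W.T @ R
    # Gradient step followed by projection onto the non-negative orthant
    W = np.maximum(W - lr * grad_W, 0)
    H = np.maximum(H - lr * grad_H, 0)
    return W, H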


In [20]:
# Customary imports

import tensorflow as tf
import numpy as np
import pandas as pd
np.random.seed(0)

In [7]:
# Creating the matrix to be decomposed

A_orig = np.array([[3, 4, 5, 2],
                   [4, 4, 3, 3],
                   [5, 5, 4, 4]], dtype=np.float32).T

A_orig_df = pd.DataFrame(A_orig)

In [8]:
A_orig_df  # (4 users, 3 movies)


Out[8]:
     0    1    2
0  3.0  4.0  5.0
1  4.0  4.0  5.0
2  5.0  3.0  4.0
3  2.0  3.0  4.0

In [21]:
# Masking some entries

A_df_masked = A_orig_df.copy()
A_df_masked.iloc[0, 0] = np.nan
np_mask = A_df_masked.notnull()
np_mask


Out[21]:
       0     1     2
0  False  True  True
1   True  True  True
2   True  True  True
3   True  True  True

Basic TensorFlow setup


In [11]:
# Boolean mask for computing cost only on valid (not missing) entries
tf_mask = tf.Variable(np_mask.values)

A = tf.constant(A_df_masked.values)
shape = A_df_masked.values.shape

# Number of latent factors
rank = 3

# Initializing random H and W
temp_H = np.random.randn(rank, shape[1]).astype(np.float32)
temp_H = np.divide(temp_H, temp_H.max())

temp_W = np.random.randn(shape[0], rank).astype(np.float32)
temp_W = np.divide(temp_W, temp_W.max())

H = tf.Variable(temp_H)
W = tf.Variable(temp_W)
WH = tf.matmul(W, H)

Cost function


In [12]:
# Squared Frobenius norm of the reconstruction error, computed only on the observed entries
cost = tf.reduce_sum(tf.pow(tf.boolean_mask(A, tf_mask) - tf.boolean_mask(WH, tf_mask), 2))
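This is the squared Frobenius norm of the reconstruction error, restricted to the observed entries via tf.boolean_mask. For intuition, an equivalent NumPy expression (a sketch; A_np, WH_np and mask are hypothetical names standing in for dense versions of the tensors above) would be:

cost_np = np.sum((A_np[mask] - WH_np[mask]) ** 2)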

Misc. TensorFlow


In [13]:
# Learning rate
lr = 0.001
# Number of steps
steps = 1000
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cost)
init = tf.global_variables_initializer()

Ensuring non-negativity


In [14]:
# Clipping operation. This ensures that the learnt W and H are non-negative
clip_W = W.assign(tf.maximum(tf.zeros_like(W), W))
clip_H = H.assign(tf.maximum(tf.zeros_like(H), H))
clip = tf.group(clip_W, clip_H)
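Clipping at zero with tf.maximum is exactly the Euclidean projection onto the non-negative orthant, which is what makes this projected gradient descent rather than plain gradient descent. A quick illustrative NumPy analogue:

x = np.array([-0.5, 0.2, -1.3, 0.7], dtype=np.float32)
np.maximum(np.zeros_like(x), x)  # -> array([0. , 0.2, 0. , 0.7], dtype=float32)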

Main TensorFlow routine


In [15]:
with tf.Session() as sess:
    sess.run(init)
    for i in range(steps):
        sess.run(train_step)
        sess.run(clip)
        if i % 100 == 0:
            print("\nCost: %f" % sess.run(cost))
            print("*"*40)
    learnt_W = sess.run(W)
    learnt_H = sess.run(H)


Cost: 148.859848
****************************************

Cost: 3.930172
****************************************

Cost: 2.068570
****************************************

Cost: 1.418309
****************************************

Cost: 0.819721
****************************************

Cost: 0.399933
****************************************

Cost: 0.176080
****************************************

Cost: 0.079007
****************************************

Cost: 0.041353
****************************************

Cost: 0.027041
****************************************

Computing the prediction


In [16]:
learnt_H


Out[16]:
array([[ 0.86129224,  1.3388027 ,  1.97224879],
       [ 2.16338873,  0.97277433,  1.17212451],
       [ 0.25879648,  1.07861733,  1.09541821]], dtype=float32)

In [17]:
learnt_W


Out[17]:
array([[ 1.15797794,  0.97454673,  1.41825044],
       [ 1.44136858,  1.16967547,  0.79135358],
       [ 0.81640321,  1.98227394,  0.02636297],
       [ 1.38819814,  0.29285902,  0.8031919 ]], dtype=float32)

In [18]:
pred = np.dot(learnt_W, learnt_H)
pred_df = pd.DataFrame(pred)
pred_df.round()


Out[18]:
     0    1    2
0  3.0  4.0  5.0
1  4.0  4.0  5.0
2  5.0  3.0  4.0
3  2.0  3.0  4.0
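As a quick sanity check, one can also compare the imputed value for the masked (0, 0) entry against the held-out original (a small sketch reusing the names defined above):

# The (0, 0) entry was hidden before training; it rounds to 3.0 in the table above
print("imputed:  %0.3f" % pred_df.iloc[0, 0])
print("original: %0.3f" % A_orig_df.iloc[0, 0])  # 3.0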

Compare with the Original


In [19]:
A_orig_df


Out[19]:
     0    1    2
0  3.0  4.0  5.0
1  4.0  4.0  5.0
2  5.0  3.0  4.0
3  2.0  3.0  4.0
